Expanded vector space model based on word space in cross media retrieval of news speech data

Authors

  • Seiichi Takao
  • Jun Ogata
  • Yasuo Ariki
Abstract

News-on-demand systems that use speech technology usually rely on automatic speech transcriptions to retrieve news data. Users specify a few keywords or sentences as a query, and the related news data is retrieved by matching the query against the speech transcription. However, when users cannot state a query clearly, a video shot of the news program they are watching can serve as a good query for retrieving related news data. As one such form of news data retrieval, we propose using video captions as the query and retrieving the related news data through the speech transcription. We call this cross media retrieval because the query and the retrieved data come from different media. The conventional method for cross media retrieval is the standard cosine measure in the vector space model, which cannot perform retrieval at the semantic level. To solve this problem, we propose an expanded vector space model based on a word space. Experimental results show that the expanded vector space model based on the word space outperforms the conventional vector space model.
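To make the contrast concrete, the following is a minimal sketch of the standard cosine measure on bag-of-words vectors, together with one simple way a query vector might be expanded with related words. The abstract does not say how the word space is built or how the expansion is computed, so the `RELATED` table, its similarity weights, and the `expand` function are hypothetical stand-ins for illustration only, not the authors' method.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Standard cosine measure between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand(vec, related):
    """Add weight to words related to the terms already in the vector.

    `related` maps a term to {related_term: similarity}. It is a
    hypothetical, hand-written substitute for a learned word space.
    """
    out = dict(vec)
    for term, weight in vec.items():
        for other, sim in related.get(term, {}).items():
            out[other] = out.get(other, 0.0) + weight * sim
    return out


# Assumed toy data: a video caption used as the query and a speech transcript.
caption = term_vector("election results")
transcript = term_vector("vote outcome")

# Assumed word-to-word similarities (illustrative values, not from the paper).
RELATED = {"election": {"vote": 0.8}, "results": {"outcome": 0.7}}

plain_score = cosine(caption, transcript)
expanded_score = cosine(expand(caption, RELATED), transcript)
print(plain_score, expanded_score)
```

With these toy inputs the caption and the transcript share no surface words, so the plain cosine score is zero, while the expanded query vector overlaps with the transcript and yields a positive score. This is the kind of semantic-level match the plain vector space model misses.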


Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...


Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval

This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-based retrieval model). English textual queries ar...


Continuous word representation using neural networks for proper name retrieval from diachronic documents

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. One approach for increasing the vocabulary coverage of a speech transcription system is to automatically retrieve new proper names from contemporary diachronic text documents. In recent years, neural network...


Thematic indexing of spoken documents by using self-organizing maps

A method is presented to provide a useful searchable index for spoken audio documents. The task differs from the traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with th...


A prosody-based vector-space model of dialog activity for information retrieval

Search in audio archives is a challenging problem. Using prosodic information to help find relevant content has been proposed as a complement to word-based retrieval, but its utility has been an open question. We propose a new way to use prosodic information in search, based on a vector-space model, where each point in time maps to a point in a vector space whose dimensions are derived from num...



Publication date: 2000